-
Notifications
You must be signed in to change notification settings - Fork 461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize vulnerability host counts #24914
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #24914 +/- ##
==========================================
- Coverage 63.55% 63.55% -0.01%
==========================================
Files 1618 1618
Lines 154469 154562 +93
Branches 4037 4037
==========================================
+ Hits 98180 98238 +58
- Misses 48553 48581 +28
- Partials 7736 7743 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
In my opinion, we should not use concurrent batches for this DB access because it will put an extra load on the DB reader. We know that the vulnerability cron uses a lot of CPU/DB resources, so the goal should be to smooth out the performance spikes. One way to do that is to pause for some time (500ms?) between each batch. |
Counterpoint to the above: smaller batches should help DB load massively, and we can expose an override for concurrency as an env var (with a default that we've confirmed as working in load test) so customers can tune how hard the tool hits the DB. My current guess is that this will be significantly lighter on the DB in load test, even concurrently, than the old massive temp table method (and I think we'll get useful info from loadtest here), and with the env var in place we can tune things easily enough once this hits production workloads. |
Also, given that we're talking about 5 concurrent sets of queries, if someone is going over prepared statement maximums based on these changes:
|
Started a Slack thread about which data sets to use to test this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only feedback here is on the routines naming. I also see where the host count is bugged but I'll fix that in a PR stacked on top of this one.
server/config/config.go
Outdated
@@ -1257,6 +1258,11 @@ func (man Manager) addConfigs() { | |||
false, | |||
"Don't sync installed Windows updates nor perform Windows OS vulnerability processing.", | |||
) | |||
man.addConfigInt( | |||
"vulnerabilities.max_routines", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"vulnerabilities.max_routines", | |
"vulnerabilities.max_concurrency", |
Seems like "concurrency" is more self-evident here. Guessing you cycled through that as a naming idea here, so it'd be useful to understand why this naming convention won.
@mostlikelee Long-running PR alert! |
0d43ea6
to
6177f13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in approving this, vs. when this commit landed.
#22364
Batching selects aggregating host counts based on CVE and adding concurrency.
changes/
,orbit/changes/
oree/fleetd-chrome/changes
.See Changes files for more information.
SELECT *
is avoided, SQL injection is prevented (using placeholders for values in statements)